5 research outputs found

    Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference

    Full text link
    Vision Transformers (ViTs) have shown impressive performance but still incur a high computation cost compared to convolutional neural networks (CNNs); one reason is that ViT attention measures global similarities and therefore has complexity quadratic in the number of input tokens. Existing efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g., Performer), sacrificing ViTs' ability to capture either global or local context. In this work, we ask an important research question: can ViTs learn both global and local context while being more efficient during inference? To this end, we propose a framework called Castling-ViT, which trains ViTs using both linear-angular attention and masked softmax-based quadratic attention, but switches to linear-angular attention alone during inference. Castling-ViT leverages angular kernels to measure the similarity between queries and keys via spectral angles, and we further simplify it with two techniques: (1) a novel linear-angular attention mechanism, which decomposes the angular kernels into linear terms and high-order residuals and keeps only the linear terms; and (2) two parameterized modules that approximate the high-order residuals: a depthwise convolution and an auxiliary masked softmax attention that helps learn both global and local information, where the masks for the softmax attention are regularized to gradually decay to zero and thus incur no overhead during inference. Extensive experiments and ablation studies on three tasks consistently validate the effectiveness of the proposed Castling-ViT, e.g., achieving up to 1.8% higher accuracy or a 40% MACs reduction on ImageNet classification, and 1.2 higher mAP on COCO detection under comparable FLOPs, compared to ViTs with vanilla softmax-based attention. Comment: CVPR 2023
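
    The abstract does not reproduce the exact kernel decomposition, but the core trick can be illustrated: for unit vectors, the angular kernel 1 - arccos(q·k)/pi expands as 1/2 + (q·k)/pi plus high-order residuals, and keeping only the linear part lets attention factorize into O(N) operations. The sketch below is a minimal illustration under these assumptions; the function name and constants are illustrative, not the authors' code.

```python
import numpy as np

def linear_angular_attention(Q, K, V, eps=1e-6):
    """Linear-term approximation of angular-kernel attention (illustrative).

    For unit-normalized q, k, the angular kernel 1 - arccos(q.k)/pi expands
    as 1/2 + (q.k)/pi + high-order residuals. Keeping only a + b*(q.k)
    factorizes the attention, so it runs in O(N) instead of O(N^2).
    Weights stay positive since a = 0.5 > b = 1/pi >= b*|q.k|.
    """
    a, b = 0.5, 1.0 / np.pi
    Qh = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kh = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    n = K.shape[0]
    kv = Kh.T @ V              # (d, d_v): sum_j k_j v_j^T, computed once
    ksum = Kh.sum(axis=0)      # (d,):    sum_j k_j
    vsum = V.sum(axis=0)       # (d_v,):  sum_j v_j
    num = a * vsum + b * (Qh @ kv)   # numerator for every query at once
    den = a * n + b * (Qh @ ksum)    # matching normalizer per query
    return num / den[:, None]

# Toy usage: 4 tokens, dim 8, value dim 3.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 3))
print(linear_angular_attention(Q, K, V).shape)  # (4, 3)
```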

    An approach for medical event detection in Chinese clinical notes of electronic health records

    No full text
    Background: Medical event detection in narrative clinical notes of electronic health records (EHRs) is a task of reading text and extracting information. Most previous work on medical event detection treats the task as extracting concepts at word granularity, which ignores the overall structure of the clinical notes. In this work, we treat each clinical note as a sequence of short sentences and propose an end-to-end deep neural network framework. Methods: We redefined the task as sequence labelling at short-sentence granularity and proposed a corresponding novel tag system. The dataset was derived from a third-level grade-A hospital and consists of 2000 clinical notes annotated according to the proposed tag system. The end-to-end framework consists of a feature extractor and a sequence labeller, and we explored different implementations of each. We additionally proposed a smoothed Viterbi decoder as a sequence labeller that requires no additional parameter training, making it a good alternative to a conditional random field (CRF) when computing resources are limited. Results: Our sequence labelling models were compared to four baselines that treat the task as text classification of short sentences. Experimental results showed that our approach significantly outperforms the baselines. The best result, an accuracy of 92.6%, was obtained with the convolutional neural network (CNN) feature extractor and the sequential CRF sequence labeller. The proposed smoothed Viterbi decoder achieved a comparable accuracy of 90.07% with fewer training parameters and delivered more balanced performance across all categories, indicating better generalization ability. Conclusions: Evaluated on our annotated dataset, the comparison results demonstrate the effectiveness of our approach for medical event detection in Chinese clinical notes of EHRs. The best feature extractor is the CNN feature extractor and the best sequence labeller is the sequential CRF decoder, and it was empirically verified that the proposed smoothed Viterbi decoder brings better generalization while achieving performance comparable to the sequential CRF decoder.
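
    The smoothed Viterbi decoder is described only at a high level in the abstract. The sketch below shows one plausible reading: transition scores are estimated as Laplace-smoothed bigram statistics of the training tags (no gradient training), and decoding is standard Viterbi over per-sentence emission scores from any short-sentence classifier. The names smoothed_transitions, viterbi, and the smoothing constant alpha are assumptions, not the paper's implementation.

```python
import numpy as np

def smoothed_transitions(label_seqs, n_tags, alpha=1.0):
    """Laplace-smoothed log transition matrix from training tag sequences."""
    counts = np.full((n_tags, n_tags), alpha)  # additive smoothing, no training
    for seq in label_seqs:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev, cur] += 1
    return np.log(counts / counts.sum(axis=1, keepdims=True))

def viterbi(emissions, log_trans):
    """emissions: (T, n_tags) log-scores, one row per short sentence."""
    T, n = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):
        # best previous tag for each current tag
        cand = score[:, None] + log_trans + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):   # backtrace from the last sentence
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy usage: 3 tags; emissions could come from a CNN sentence classifier.
trans = smoothed_transitions([[0, 1, 2, 2], [0, 2, 2, 1]], n_tags=3)
em = np.log(np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.1, 0.2, 0.7]]))
print(viterbi(em, trans))
```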

    Modeling and Parameter Sensitivity Improvement in ΔE-Effect Magnetic Sensor Based on Mode Localization Effect

    No full text
    A mode-localized ΔE-effect magnetic sensor model is established theoretically and numerically. Based on the designed weakly coupled resonators with a multi-layer film structure, we investigate how the ΔE effect of the magnetostrictive film under an external magnetic field perturbs the stiffness of the coupled resonators and thereby induces mode localization. Using the amplitude ratio (AR) as the output, the mode-localized ΔE-effect magnetic sensor improves relative sensitivity by three orders of magnitude over the traditional frequency output, as verified by simulations based on the finite element method (FEM). In addition, the effects of material properties and geometric dimensions on sensor performance parameters, such as sensitivity, linear range, and static operating point, are analyzed in detail, providing a theoretical basis for the design and optimization of mode-localized ΔE-effect magnetic sensors in different application scenarios. By reasonably optimizing the key parameters of the weakly coupled resonators, a mode-localized ΔE-effect magnetic sensor with a sensitivity of 18 AR/mT and a linear range of 0.8 mT can be achieved.
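
    To make the orders-of-magnitude claim concrete, the following minimal sketch models the two weakly coupled resonators as a 2-DOF eigenproblem, with a stiffness perturbation dk standing in for the ΔE-induced change in the magnetostrictive resonator. All parameter values are illustrative, not the paper's design; it merely shows why the AR shift scales like dk/kc while the frequency shift scales like dk/k.

```python
import numpy as np

def modes(k, kc, dk, m=1.0):
    """Eigenfrequencies and amplitude ratios of two weakly coupled resonators.

    Stiffness matrix K = [[k+kc, -kc], [-kc, k+kc+dk]]; dk models the
    Delta-E-induced stiffness perturbation of the magnetostrictive resonator.
    """
    K = np.array([[k + kc, -kc], [-kc, k + kc + dk]])
    w2, vecs = np.linalg.eigh(K / m)       # modal frequencies and shapes
    freqs = np.sqrt(w2) / (2 * np.pi)
    ar = np.abs(vecs[0] / vecs[1])         # amplitude ratio of each mode
    return freqs, ar

k, kc, dk = 1.0, 1e-3, 1e-5                # weak coupling: kc << k
f0, ar0 = modes(k, kc, 0.0)
f1, ar1 = modes(k, kc, dk)
rel_f = abs(f1[0] - f0[0]) / f0[0]         # ~ dk / (4k)
rel_ar = abs(ar1[0] - ar0[0]) / ar0[0]     # ~ dk / (2kc)
print(f"relative frequency shift: {rel_f:.2e}")
print(f"relative AR shift:        {rel_ar:.2e}")  # larger by ~2k/kc
```

    With these toy numbers the AR output responds roughly 2k/kc = 2000 times more strongly than the frequency output, consistent with the three-orders-of-magnitude improvement reported in the abstract.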